We derive an equality for non-equilibrium statistical mechanics. The equality concerns the worst-case
work output of a time-dependent Hamiltonian protocol in the presence of a Markovian heat
bath. It has the form “worst-case work = penalty - optimum.” The equality holds for all rates of
changing the Hamiltonian and can be used to derive the optimum by setting the penalty to 0. The
optimum term contains the max entropy of the initial state, rather than the von Neumann entropy,
thus recovering recent results from single-shot statistical mechanics. We apply the equality to an
electron box.


General Introduction —Average values of quantities are not always typical values: outcomes may fluctuate significantly. In non-equilibrium nano and quantum systems this is often the case, with, for example, the work output of a protocol having a significant probability of deviating from the average. Hence, in these important systems, statements about averages have limited use when it comes to predicting what will happen in any given trial; the fluctuations need to be discussed explicitly.

Two key relations concerning fluctuations in work, Crooks’ Theorem [1] and Jarzynski’s Equality [2], have been studied extensively, both theoretically and experimentally. Amongst other things, they can be used to determine free energies of equilibrium states from non-equilibrium experiments.

A recently developed alternative approach to non-equilibrium statistical mechanics is single-shot statistical mechanics [SHU], inspired by single-shot information theory min]. The focus is on statements that are guaranteed to be true in every trial, rather than on average behaviors. For example, one can ask whether a process’s work output is guaranteed to exceed some threshold value (such as an activation energy), or whether a process’s work cost is guaranteed not to exceed some threshold value (beyond which the system may break from dissipating heat). These statements concern the worst-case work of a process. A key realisation is that the optimal worst-case work is determined not by the von Neumann/Shannon entropy of the initial state, but rather by the max entropy, which is the logarithm of the number of non-zero eigenvalues of the density matrix. Thus, which entropy one should use in statements about optimal work depends on which property of the work probability distribution one is interested in.

Single-shot statistical mechanics began with almost no a priori relation to fluctuation theorems, but promising links were made in [6, 15]. We shall use two realizations from those works, namely that (i) in the trajectories model for work extraction, both single-shot and fluctuation results apply; and (ii) Crooks’ Theorem can be used to make a certain statement about worst-case work. A natural question that arose from these results is how to link Crooks’ Theorem to the existing single-shot statements concerning optimal work in terms of the entropy of the initial state.

We here show that key expressions concerning optimal worst-case work from [21 O [5] follow from Crooks’ Theorem plus some extra thought. We moreover generalise them by giving an equality for the worst-case work that holds for any protocol in the set-up, including fast protocols. The equality holds in every process in a general set-up that involves a time-varying Hamiltonian and a single Markovian heat bath, modelled using trajectories. It has the form ‘worst-case work = penalty - optimum,’ and the optimum can thus be derived by setting the penalty to zero. To make the link to physics clear, we apply the result to an electron box experiment [TMT^ .

We begin with defining the set-up. 

One-shot relative entropies —The standard relative entropy is $D(\rho\|\sigma) := \mathrm{Tr}(\rho[\log\rho - \log\sigma])$ [12], where log in this paper means the natural logarithm, also known as ln. This is part of a wider class of relative entropies known as the Rényi relative entropies, which are parameterized by a number $\alpha$. We shall use two other members of that family: the (classical version of the) $\infty$ relative entropy $D_\infty(P\|Q) := \sup_x \log\big(P(x)/Q(x)\big)$ and the 0 relative entropy $D_0(\rho\|\sigma) := -\log\mathrm{Tr}(\pi_\rho\sigma)$, wherein $\pi_\rho$ projects onto the support of $\rho$ m-. These are called one-shot relative entropies as they arise naturally in one-shot (also called single-shot) information theory [T21 [HI[^ .
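The classical versions of these quantities are easy to evaluate numerically. The following is a minimal sketch, assuming probability vectors stored as plain Python lists; the function names and the example distribution are illustrative choices, not taken from the paper. It also checks the elementary fact that, against a uniform distribution on $d$ outcomes, $D_0$ equals $\log d$ minus the max entropy.

```python
import math

def max_entropy(p):
    """S_max: logarithm of the number of non-zero probabilities (log of the rank)."""
    return math.log(sum(1 for x in p if x > 0))

def shannon_entropy(p):
    """Shannon entropy in nats, shown only for comparison with S_max."""
    return -sum(x * math.log(x) for x in p if x > 0)

def d_infinity(p, q):
    """Classical D_inf(P||Q): maximum over the support of P of log(P(x)/Q(x))."""
    if any(px > 0 and qx == 0 for px, qx in zip(p, q)):
        return math.inf
    return max(math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def d_zero(p, q):
    """Classical D_0(P||Q): minus the log of the Q-weight on the support of P."""
    return -math.log(sum(qx for px, qx in zip(p, q) if px > 0))

# Illustrative (assumed) example: a rank-2 distribution on 4 outcomes.
p = [0.7, 0.3, 0.0, 0.0]
uniform = [0.25] * 4

print(d_zero(p, uniform), math.log(4) - max_entropy(p))  # the two numbers agree
print(shannon_entropy(p), max_entropy(p))                # Shannon entropy is the smaller one
```

Only the support of `p` enters `d_zero` and `max_entropy`, which is why these quantities are insensitive to how the weight is spread over the occupied levels.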
Protocols, trajectory model of —We now describe the theoretical model, using the notation of m-. The physical scenario we have in mind is depicted in Fig. 1.

A protocol will be a sequence of elementary changes: (i) changes of the Hamiltonian and (ii) thermalizations. We shall initially assume there is a finite number of such steps (but later show that the continuum limit is well-defined and corresponds to a master equation, at least in the discrete-classical case). The Hamiltonian is parameterized by $\lambda_m$, where $m$ is an integer that labels the step.

Figure 1: A schematic of the setup. There is a working-medium system, a battery system from which work is taken or to which it is given, and a single heat bath. The battery system has the effect of altering the Hamiltonian of the working medium, depicted with the blue arrow shifting an energy level. The heat bath has the effect of hopping the system between energy levels, depicted by the red arrow.

1. Hamiltonian changes map $\lambda_m$ to $\lambda_{m+1}$. We follow m in supposing there is an energy measurement in the instantaneous energy eigenbasis at the beginning and end of each Hamiltonian-changing step. In a given realisation the system then evolves from $|i_m, \lambda_m\rangle$ to $|i'_m, \lambda_{m+1}\rangle$, where $i_m$ labels the energy eigenstate. This costs work given by the energy difference: $W_m = E(|i'_m, \lambda_{m+1}\rangle) - E(|i_m, \lambda_m\rangle)$. An important special case is $i_m = i'_m$, which arises in the quasi-static (quantum adiabatic) limit, as well as if the energy eigenbasis is constant and only the energy eigenvalues change; this can be termed the discrete-classical case.

2. Thermalizations map $i'_m$ to $i_{m+1}$, cost no work, and preserve the Hamiltonian: $|i'_m, \lambda_{m+1}\rangle \to |i_{m+1}, \lambda_{m+1}\rangle$. For notational simplicity let us label this as $|i\rangle \to |j\rangle$ with energy $E_i \to E_j$. The hopping probabilities respect thermal detailed balance: $p(|i\rangle \to |j\rangle)/p(|j\rangle \to |i\rangle) = e^{-\beta(E_j - E_i)}$. The energy change $E_j - E_i$ from such a step is called heat, $Q_m$.
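As an illustration of the detailed-balance condition, the sketch below uses a Metropolis hopping rule. This rule is an assumption made for the example (the text only requires that detailed balance holds, not any particular rule), and the energies and inverse temperature are arbitrary.

```python
import math

def metropolis_hop(E_i, E_j, beta, n_levels):
    """Probability of hopping i -> j (j != i) under a Metropolis rule:
    propose one of the other levels uniformly, then accept with
    probability min(1, exp(-beta * (E_j - E_i)))."""
    accept = min(1.0, math.exp(-beta * (E_j - E_i)))
    return accept / (n_levels - 1)

beta = 2.0                   # illustrative inverse temperature
energies = [0.0, 0.7, 1.5]   # illustrative energy levels

# Check p(i -> j) / p(j -> i) = exp(-beta (E_j - E_i)) for every pair i != j.
detailed_balance_holds = all(
    math.isclose(
        metropolis_hop(Ei, Ej, beta, len(energies))
        / metropolis_hop(Ej, Ei, beta, len(energies)),
        math.exp(-beta * (Ej - Ei)),
    )
    for a, Ei in enumerate(energies)
    for b, Ej in enumerate(energies)
    if a != b
)
print(detailed_balance_holds)
```

Any other rule with the same ratio of forward to backward hopping probabilities would serve equally well in the trajectory model.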

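A trajectory is the time-sequence of energy eigenstates occupied: $|i_0,\lambda_0\rangle \to |i'_0,\lambda_1\rangle \to |i_1,\lambda_1\rangle \to \dots \to |i'_{f-1},\lambda_f\rangle \to |i_f,\lambda_f\rangle$.

The probability of a given trajectory is accordingly, assuming a Markovian heat bath,

    $p(\mathrm{traj}) = p(|i_0,\lambda_0\rangle) \times \prod_{m=0}^{f-1} p(|i_m,\lambda_m\rangle \to |i'_m,\lambda_{m+1}\rangle)\, p(|i'_m,\lambda_{m+1}\rangle \to |i_{m+1},\lambda_{m+1}\rangle).$    (1)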


A trajectory’s inverse is the reverse of the sequence. The inverse corresponds, in the discrete-classical case, to the Hamiltonian changes running in reverse, from $\lambda_f$ to $\lambda_0$, and to the same thermalizations as in the forward protocol, with the sequence exactly inverted. This process is termed the reverse process. Beyond the discrete-classical case, the unitary associated with the inverse process Hamiltonian change is defined such that $p(|i'_m,\lambda_{m+1}\rangle \to |i_m,\lambda_m\rangle) = p(|i_m,\lambda_m\rangle \to |i'_m,\lambda_{m+1}\rangle)$. Our results will hold under that condition. There are at least two ways of satisfying that condition: (i) simply let the unitary of the corresponding elementary step in the reverse process be $U^{-1}$, where $U$ is that of the forwards process; (ii) apply a suitable ‘time-reversal’ operator $\Theta$ to all states and operators involved, as in [21]. The reverse trajectory is then the reverse sequence of the time-reversed energy eigenstates: $\Theta|i_f,\lambda_f\rangle \dots \Theta|i_0,\lambda_0\rangle$, with the condition $p(|i'_m,\lambda_{m+1}\rangle \to |i_m,\lambda_m\rangle) = p(\Theta|i_m,\lambda_m\rangle \to \Theta|i'_m,\lambda_{m+1}\rangle)$ being satisfied, as time reversal implies taking the complex conjugate of the states, in a preferred basis, and the transpose of the time-evolution in the same basis: $U \to U^T$. The condition is thus satisfied as $\langle b|U|a\rangle = (\langle a|U^\dagger|b\rangle)^* = \langle a^*|U^T|b^*\rangle$.

A given trajectory has some work cost $w = \sum_m W_m$, in line with the definition of the Hamiltonian-changing steps. The inverse trajectory has work cost $-w$. A given protocol on a given initial state induces some probability distribution over trajectories, with an associated probability distribution over work $p(w)$. The forwards and reverse protocols give rise to $p_{fwd}(w)$ and $p_{rev}(-w)$ respectively.

If the initial density matrices of the forwards and reverse processes are both thermal, i.e. $\exp(-\beta H(\lambda_0))/Z_0$ and $\exp(-\beta H(\lambda_f))/Z_f$ respectively, Crooks’ Theorem holds pT| :

    $\frac{p_{fwd}(w)}{p_{rev}(-w)} = \frac{Z_f}{Z_0}\exp(\beta w).$    (2)


(To derive it, take the ratio of Eq. (1) and the corresponding reverse trajectory expression. Apply thermal detailed balance and the equality of reverse hopping probabilities for the Hamiltonian-changing steps. Sum over trajectories with the same $w$, and note that the reverse of a trajectory has the same work up to a minus sign.)
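The derivation can be checked by brute force in a small discrete-classical example. The sketch below enumerates every trajectory of a two-level system and compares the two sides of Crooks’ Theorem for each work value. The level schedule, the inverse temperature, and the partial-thermalization rule (a mixture of "stay put" and "resample from the Gibbs state", which satisfies detailed balance) are all assumptions made for this illustration.

```python
import itertools
import math
from collections import defaultdict

beta = 1.0
level1 = [0.0, 0.5, 1.0]   # assumed energy of level 1 at lambda_0, lambda_1, lambda_2
mix = 0.3                  # assumed strength of each partial thermalization
f = len(level1) - 1        # number of (Hamiltonian change, thermalization) steps

def energies(m):
    """Energies of levels 0 and 1 at step m; level 0 is kept at zero."""
    return [0.0, level1[m]]

def gibbs(m):
    """Thermal occupation probabilities and partition function at step m."""
    weights = [math.exp(-beta * e) for e in energies(m)]
    z = sum(weights)
    return [x / z for x in weights], z

def hop(m, i, j):
    """Hopping probability i -> j at lambda_m; obeys thermal detailed balance."""
    return (1 - mix) * (i == j) + mix * gibbs(m)[0][j]

g0, Z0 = gibbs(0)
gf, Zf = gibbs(f)
p_fwd = defaultdict(float)   # p_fwd(w)
p_rev = defaultdict(float)   # p_rev(-w), stored under the key w

for traj in itertools.product(range(2), repeat=f + 1):
    # Work is paid only in the Hamiltonian-changing steps.
    w = sum(energies(m + 1)[traj[m]] - energies(m)[traj[m]] for m in range(f))
    pf = g0[traj[0]]     # forward process starts thermal at lambda_0
    pr = gf[traj[-1]]    # reverse process starts thermal at lambda_f
    for m in range(f):
        pf *= hop(m + 1, traj[m], traj[m + 1])
        pr *= hop(m + 1, traj[m + 1], traj[m])
    p_fwd[round(w, 9)] += pf
    p_rev[round(w, 9)] += pr

crooks_holds = all(
    math.isclose(p_fwd[w] / p_rev[w], (Zf / Z0) * math.exp(beta * w)) for w in p_fwd
)
print(crooks_holds)
```

Because the check is an exact enumeration rather than a sampling experiment, it holds to floating-point precision for any choice of `mix` between 0 and 1.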
Worst-case work —The central object of interest is the worst-case work

    $w^0 := \max\{w : p(w) > 0\},$

also known as the guaranteed work [J. In practice this may be realised by some very unlikely trajectory, and it is then natural to consider the worst-case work of some subset of trajectories $T$:

    $w^0_T := \max\{w : p(w) > 0 \text{ and } \mathrm{traj} \in T\}.$

Equality for worst-case work — Consider an initial state $\rho_0$, and a protocol of thermalizations and Hamiltonian changes with initial and final Hamiltonians $H(\lambda_0)$ and $H(\lambda_f)$ respectively. This induces a work probability distribution $p(w)$ and an associated $w^0$. We shall derive an equality of the form $\beta w^0$ = penalty - optimum.

We consider initial states of the form $\rho_0 = \sum_i p_i |i_0,\lambda_0\rangle\langle i_0,\lambda_0|$, i.e. diagonal in the energy eigenbasis though not necessarily thermal (energy coherence may still arise during the protocol). We take $p_i \neq 0$. This is because we wish to avoid divergences from dividing by $p_i$. (See [22] for an alternative way of approaching this divergence problem.)

To apply Crooks’ Theorem (Eq. (2)) here, even though the initial state is not assumed to be thermal, our approach is as follows. A non-thermal state can have the same worst-case work as a thermal one. For example, for a degenerate two-level system the thermal state is $\gamma = \tfrac{1}{2}|0\rangle\langle 0| + \tfrac{1}{2}|1\rangle\langle 1|$, and the state $\rho_0 = \tfrac{2}{3}|0\rangle\langle 0| + \tfrac{1}{3}|1\rangle\langle 1|$ has the same worst-case work as $\gamma$. This follows because the set of trajectories with non-zero probability is the same in both cases, as can be seen from Eq. (1), which gives the probability of a trajectory. Given a $\rho_0$ we will then find a corresponding thermal state which has the same worst-case work and apply Crooks’ Theorem to that.

An important practical consideration which makes this more subtle is that some $p_i$ may be negligible. It is then natural to exclude trajectories starting in those states when calculating the worst-case work. We therefore divide the initial energy eigenstates into two sets: the set of interest, $S_{IN}$, and the rest, which we call $S_{OUT}$, corresponding to those we shall exclude when calculating the worst-case work. The probability of being in $S_{OUT}$ is given by

    $p(OUT) = \sum_{|i_0,\lambda_0\rangle \in S_{OUT}} \mathrm{Tr}(|i_0,\lambda_0\rangle\langle i_0,\lambda_0|\rho_0).$

We define $T_{IN}$ as the set of possible (meaning $p > 0$) trajectories beginning in $S_{IN}$ and similarly $T_{OUT}$ as the set of possible trajectories beginning in $S_{OUT}$. Recall that each trajectory has some work value associated with it. We call the worst-case work of $T_{IN}$ $w^0_{IN}$; this cannot be worse than the worst case over all trajectories: $w^0_{IN} \le w^0$.

Now we design an associated thermal state to yield the same worst-case work as $\rho_0$, i.e. $w^0_{IN}$, and later show this to indeed be the case under an additional mild assumption. We define it as

    $\tilde\gamma = \sum_{|i_0,\lambda_0\rangle \in S_{IN}} \frac{e^{-\beta E_i}}{\tilde Z}\, |i_0,\lambda_0\rangle\langle i_0,\lambda_0| + \sum_{|i_0,\lambda_0\rangle \in S_{OUT}} p_i\, |i_0,\lambda_0\rangle\langle i_0,\lambda_0|,$

changing the energies of $S_{OUT}$ to new ones, $\tilde E_i$, such that $p_i = \exp(-\beta\tilde E_i)/\tilde Z$, and leaving the other energy levels the same. Our definition implies that

    $\tilde Z = \frac{\sum_{|i_0,\lambda_0\rangle \in S_{IN}} e^{-\beta E_i}}{1 - p(OUT)}.$    (3)

This partition function differs from that of the actual Hamiltonian $H(\lambda_0)$. Ignoring the $S_{OUT}$ levels helps lower the calculated work cost, as can be seen in the $\tilde Z$ being smaller.

In this scenario with $\tilde\gamma$ as the initial state and the $S_{OUT}$ levels lifted, the protocol is the same as in the actual scenario, except that initially the $S_{OUT}$ levels are lowered down to the levels of the actual Hamiltonian of interest. The worst-case work of this scenario is called $\tilde w^0$. We show (see Methods) that under a mild additional assumption that the worst-case work is bounded from below,

    $\tilde w^0 = w^0_{IN},$    (4)

as desired.

To get $\tilde w^0$ from Crooks’ Theorem (Eq. (2)) we follow [15]. Take the initial state of the forwards process to be $\rho_0 = \tilde\gamma$, and the initial state of the reverse process as $\gamma = e^{-\beta H(\lambda_f)}/Z_f$. Consider the equality of Crooks’ Theorem (for values of $w$ such that $p_{fwd}(w) > 0$) and select the value for $w$ which maximises the LHS (and thus the RHS) [15]:

    $\max_w \frac{p_{fwd}(w)}{p_{rev}(-w)} = \max_w \frac{Z_f}{\tilde Z}\, e^{\beta w}.$

The RHS is monotonic in $w$, so maximizing the RHS over the support of $p_{fwd}(w)$ leads to the maximum $w$-value $\tilde w^0$. Taking the logarithm and recalling the $D_\infty$ definition yields

    $\beta\tilde w^0 = D_\infty(p_{fwd}(w)\|p_{rev}(-w)) - \log(Z_f/\tilde Z).$    (5)

Main result — Combining Eq. (4) and Eq. (5) we thus have

    $\beta w^0_{IN} = D_\infty(p_{fwd}(w)\|p_{rev}(-w)) - \log(Z_f/\tilde Z).$    (6)

Thus the worst-case work of the trajectories of interest is equal to ($kT$ times) a relative entropy minus the logarithm of the ratio of two partition functions, one of which encodes information about how many of the initial energy eigenstates have negligible occupation probability.
Discussion —Equation (6) has the form

    $\beta w^0_{IN}$ = penalty - optimum.

The penalty is given by the difference between the forward and reverse distributions, quantified by $D_\infty$. The optimum one can hope for, with a given initial state and given initial and final Hamiltonian, is to set the penalty to 0 (as relative entropies are non-negative), which leaves $-\log(Z_f/\tilde Z)$. This term is made more negative the smaller the support of $\rho$ is and the lower the final energies are relative to the initial ones. To illustrate the notation used, a very simple example of applying the formula is given in Fig. 2.
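The numbers quoted for the Fig. 2 example can be reproduced in a few lines. The sketch below takes the trajectory probabilities from the figure (0.9 and 0.1 forwards, 0.5 each in reverse), uses Eq. (3) for $\tilde Z$ with the single IN level at zero energy, and evaluates penalty minus optimum. The variable names and the choice $\beta = 1$ are assumptions made for the illustration.

```python
import math

beta = 1.0      # units with kT = 1 (assumed for the example)
p_out = 0.1     # occupation of the excluded level |1>

# Eq. (3): only the IN level |0>, at energy 0, enters the sum.
Z_tilde = math.exp(-beta * 0.0) / (1 - p_out)
# At lambda_f both levels sit at energy 0.
Z_f = 2 * math.exp(-beta * 0.0)
optimum = math.log(Z_f / Z_tilde)

# Forward and reverse trajectory probabilities as listed in the figure.
p_fwd = {"traj 1": 0.9, "traj 2": 0.1}
p_rev = {"traj 1": 0.5, "traj 2": 0.5}
# The two trajectories have distinct work values, so D_inf is a maximum over them.
penalty = max(math.log(p_fwd[t] / p_rev[t]) for t in p_fwd)

beta_w_in = penalty - optimum
print(penalty, optimum, beta_w_in)   # both terms equal log(1.8); the difference is 0 up to rounding
```

The result matches the worst-case work of the only IN trajectory, which costs no work.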

We now consider the optimum term in two important special cases where the single-shot entropy of the initial state emerges: (i) If $p(OUT) \to 0$ and $H(\lambda_0) = H(\lambda_f)$, then $\log(Z_f/\tilde Z) = -\log\sum_{i \in \mathrm{supp}(\rho_0)} e^{-\beta E_i}/Z_f = D_0(\rho_0\|\gamma)$. Thus in this case the equality has the form

    $\beta w^0 = D_\infty - D_0.$

(ii) If $p(OUT) \to 0$ and $H(\lambda_0) = H(\lambda_f) = 0$, $D_0(\rho_0\|\gamma) = \log d - S_{max}(\rho_0)$ (noting $\gamma = \mathbb{1}/d$ and recalling $S_{max}(\rho) := S_0(\rho) := \log(\mathrm{rank}(\rho))$). This recovers the known results from [3, 5, 6] that these are optimal in the respective cases. (In the more general case where $p(OUT)$ is finite one recovers the smooth relative entropy as the optimal quantity; see Methods.) The message is that it is the max entropy $S_{max}$ which determines the optimal worst-case work, rather than the von Neumann entropy. If one defines thermodynamic entropy in terms of optimally extractable worst-case work, it is the max entropy which should be used.

Figure 2: A very simple example of how to apply the formula. The energy levels are black lines and the occupation probabilities are indicated by the height of the blue rectangles. The forwards protocol here is to lift the second level from $-\delta E$ to 0. The reverse is to lower it back. Suppose for concreteness that $\rho_0 = 0.9|0,\lambda_0\rangle\langle 0,\lambda_0| + 0.1|1,\lambda_0\rangle\langle 1,\lambda_0|$. This table describes the two possible trajectories, their work costs and probabilities (reverse trajectories in brackets).

    traj      trajectory                                      traj set        work w                   prob
    traj 1    $|0,\lambda_0\rangle \to |0,\lambda_f\rangle$    $\in T_{IN}$    0 (0)                    0.9 (0.5)
    traj 2    $|1,\lambda_0\rangle \to |1,\lambda_f\rangle$    $\in T_{OUT}$   $\delta E$ ($-\delta E$)  0.1 (0.5)

$\beta w^0_{IN} = 0$; $D_\infty = \log(2(0.9))$; $\log(Z_f/\tilde Z) = \log\big(\frac{2}{1/0.9}\big)$. Thus the equality is in this case: $0 = \log 2(0.9) - \log 2(0.9)$.

To make the connection to physics clear, we apply the results to a recent realization of a Szilard engine with an electron box [TS1118) . A great advantage of using this trajectories model from the fluctuation theorem approach is that it allows the application of single-shot results to such experiments. We describe the set-up in Figure 3, and in the Methods we analyse what controls the penalty term $D_\infty$ in this scenario.

As described in the trajectories section, these results also apply if the evolution includes unitaries that create energy coherences. One might think that coherences will always worsen the worst-case work or its probability. As a counter-example according to this trajectory model, suppose $H(\lambda_0) = 0$; $\rho_0 = \tfrac{1}{3}|0\rangle\langle 0| + \tfrac{2}{3}|1\rangle\langle 1|$, and $H(\lambda_f) = \delta E|f\rangle\langle f|$. If the energy eigenstates stay the same throughout such that $|f\rangle = |1\rangle$, the worst-case work is $\delta E$ and it has probability 2/3 (even if the shift is done quickly). If instead the Hamiltonian eigenstates change such that $|0\rangle \to |+\rangle$, and $|1\rangle \to |f\rangle = |-\rangle$, then the worst-case work is still $\delta E$, corresponding to outcome $|-\rangle$ of the final energy measurement. However the probability of this can be as low as 1/2 (if $H$ is changed suddenly, $p(|-\rangle) = \mathrm{Tr}(\rho_0|-\rangle\langle -|) = 1/2$). This shows that the probability of the worst case can actually be improved (lowered) by coherence due to suddenly changing the Hamiltonian, though at the cost of randomising the work distribution.

Figure 3: A schematic of an “electron box” (D) coupled to a metallic electrode (R) via tunnelling (with rate $\Gamma$) and the capacitor with capacitance $C_j$, and to the gate electrode via the capacitor with $C_g$. The gate voltage $V_g$ controls the number of excess electrons on the electron box, which at low temperatures is restricted to two possible values and serves as a logical basis $|0\rangle$ and $|1\rangle$ for a qubit. Namely, it tunes the relative energy by $H \propto -C_g V_g |1\rangle\langle 1|$. The electrode R plays the role of a heat bath, where the tunnelling in/out of the box D corresponds to thermal excitation/relaxation. Experimentally, the work and heat can be measured by probing the charge on D in real time with a single-electron transistor next to D (not shown in the figure) as demonstrated in Refs. mni-. In the Szilard engine protocol, $H(\lambda_0) = H(\lambda_f) = 0$, $\rho_0 = |0\rangle\langle 0|$. We thus set $\tilde\gamma = |0\rangle\langle 0|$. The $D_0$ term then becomes $\ln 2$, so that $w^0_{IN}$ = penalty - optimum = $kT D_\infty - kT\ln 2$. In the Methods we derive a master equation for the characteristic function $Z(\xi) = \langle e^{i\xi w}\rangle$ of the work distribution function $P(w)$. This new master equation allows us to calculate efficiently the characteristic function, the work distribution itself (Figure 4), and a bound for $D_\infty$ (see the Methods).

Figure 4: Work distributions (as functions of $\beta w$) calculated analytically for the forward (a) and reverse (b) process on an electron box. The two levels initially have the same energy, and one of them is lifted linearly up to IiQksT and back to 0. The values of the zero-energy tunnelling rate $\Gamma_0$ and the operation time t are set such that Vot/sc = ksT, where Ec is the relaxation time of the metallic electrode (charge reservoir).
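The two probabilities in this counter-example follow from a one-line Born-rule calculation. The sketch below represents $\rho_0$ as a 2×2 matrix in the $\{|0\rangle, |1\rangle\}$ basis and $|-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$ as a real vector; the helper name `outcome_probability` is an illustrative choice.

```python
import math

# rho_0 = 1/3 |0><0| + 2/3 |1><1| in the {|0>, |1>} basis.
rho0 = [[1 / 3, 0.0],
        [0.0, 2 / 3]]

def outcome_probability(state, rho):
    """<state| rho |state> for a real, normalised state vector."""
    n = len(state)
    return sum(state[a] * rho[a][b] * state[b] for a in range(n) for b in range(n))

one = [0.0, 1.0]                                  # |f> = |1>: eigenbasis unchanged
minus = [1 / math.sqrt(2), -1 / math.sqrt(2)]     # |f> = |->: sudden change of eigenbasis

p_worst_no_coherence = outcome_probability(one, rho0)     # 2/3
p_worst_sudden_change = outcome_probability(minus, rho0)  # 1/2
print(p_worst_no_coherence, p_worst_sudden_change)
```

Since $\rho_0$ is diagonal, the probability of the $|-\rangle$ outcome is 1/2 whatever the initial populations are, which is the sense in which the sudden change randomises the work.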

In the Methods we go further and consider a smaller subset of trajectories, cutting away also trajectories that start in a likely initial state but nevertheless have low probability. We describe the continuous time limit, and the electron box scenario in detail.

Summary and outlook —We showed that in any protocol with a time-varying Hamiltonian and thermalizations, the worst-case work takes the form of “penalty - optimum”. The model we used could be generalised in various ways, including non-Markovian baths and baths that decohere in other bases than the energy basis. It is also important to find more bounds for the penalty term in terms of controllable parameters.

Note added: Similar results were obtained independently by Salek and Wiesner, using a different set-up and different starting assumptions, in: Fluctuations in Single-Shot ε-deterministic Work Extractions.

Appendix A: Properties of $\tilde\gamma$ and associated protocol


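For a given initial state $\rho = \sum_i p_i|i\rangle\langle i|$ with initial energy eigenvalues $E_i$, the associated thermal state is defined as $\tilde\gamma = \sum_i \frac{e^{-\beta\tilde E_i}}{\tilde Z}|i\rangle\langle i|$, where $\tilde E_i = E_i$ for $|i\rangle \in S_{IN}$, but for $|i\rangle \in S_{OUT}$, $\tilde E_i$ is chosen such that $e^{-\beta\tilde E_i}/\tilde Z = p_i$. Physically, this implies replacing the energy levels with small occupation probability $p_i$ by much higher energy levels such that their thermal occupation probability is as small as $p_i$. The Hamiltonian associated with $\tilde\gamma$ is accordingly $\tilde H = \sum_{|i\rangle \in S_{IN}} E_i|i\rangle\langle i| + \sum_{|i\rangle \in S_{OUT}} \tilde E_i|i\rangle\langle i|$. The normalising factor is $\tilde Z = \sum_i e^{-\beta\tilde E_i}$. These definitions imply that

    $\tilde Z = \frac{\sum_{|i\rangle \in S_{IN}} e^{-\beta E_i}}{1 - p(OUT)}.$    (A1)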
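The construction of the lifted levels can be made concrete with a short numerical sketch. The IN energies, the OUT probabilities and $\beta = 1$ below are assumed example values; the code builds $\tilde Z$ from Eq. (A1), solves $e^{-\beta\tilde E_i}/\tilde Z = p_i$ for the lifted OUT energies, and confirms that the resulting state is a normalised thermal state of the modified Hamiltonian.

```python
import math

beta = 1.0
E_in = [0.0, 0.3]       # assumed energies of the levels in S_IN (left unchanged)
p_out = [0.02, 0.01]    # assumed small occupation probabilities of the levels in S_OUT

# Eq. (A1): partition function of the associated thermal state.
Z_tilde = sum(math.exp(-beta * E) for E in E_in) / (1 - sum(p_out))

# Lifted OUT energies, chosen so that exp(-beta * E_tilde) / Z_tilde = p_i.
E_out_tilde = [-math.log(p * Z_tilde) / beta for p in p_out]

# Occupation probabilities of the associated thermal state.
all_energies = E_in + E_out_tilde
gamma_tilde = [math.exp(-beta * E) / Z_tilde for E in all_energies]

print(E_out_tilde)        # both lie well above the IN levels
print(sum(gamma_tilde))   # normalised, up to rounding
```

The smaller the excluded probabilities, the higher the lifted levels, in line with the statement that the extra work gain of the OUT trajectories diverges as $p(OUT) \to 0$.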


Apart from the given actual protocol, we also design a ~-protocol such that it gives the same worst-case work in the case of $\tilde\gamma$ as the initial state. We define the ~-protocol as beginning with the energies $\tilde E_i$, then lowering the OUT levels back to $E_i$, i.e. setting $\tilde E_i \to E_i$. After that it is the same as the actual protocol. We call the ~-protocol applied to $\tilde\gamma$ “the ~-scenario.”

In the ~-scenario we similarly have $\tilde T_{IN}$ and $\tilde T_{OUT}$, and $\tilde w^0_{IN}$. The following holds:

    $w^0_{IN} = \tilde w^0_{IN},$    (A2)

i.e. the worst-case work is the same in the ~-scenario as in the actual scenario, for the $T_{IN}$ subset of trajectories. This is because the protocol is defined such that the added initial step in the ~-scenario only involves the OUT levels. The set of possible work values is the same in $T_{IN}$ and $\tilde T_{IN}$.

We now make the following mild restriction on protocols allowed:

    $\tilde w^0_{IN} = \tilde w^0.$    (A3)

We say this is mild, because the trajectories $\tilde T_{OUT}$ have an extra work gain relative to their sister trajectories in $T_{OUT}$ following from their initial lowering. This gain tends to infinity as $p(OUT) \to 0$. The restriction of equation (A3) then means that the negative infinite work from a $\tilde T_{OUT}$ trajectory is not a worse work cost than that from a $\tilde T_{IN}$ trajectory. Combining Eqs. (A2) and (A3) gives the desired expression used in the main body:

    $w^0_{IN} = \tilde w^0.$

Appendix B: Smooth relative entropy 

As noted in the main body, the optimum term reduces to a relative entropy in a special case. If $H(\lambda_0) = H(\lambda_f)$, $\log(Z_f/\tilde Z) = -\log\sum_{i \in \mathrm{supp}(\rho_0)} e^{-\beta E_i}/Z_f = D_0(\rho_0\|\gamma)$. Moreover if $H(\lambda_0) = H(\lambda_f) = 0$, $D_0(\rho_0\|\gamma) = \log d - S_{max}(\rho_0)$ (noting $\gamma = \mathbb{1}/d$). This recovers the known results from [3, 5, 6] that these are optimal in the respective cases. If $p(OUT)$ defined above is not necessarily zero, this optimal term depends on which levels are chosen to be in $S_{OUT}$. If one chooses the best cut between IN and OUT, in the sense of minimising $\tilde Z$ and thus the worst-case work, the optimal term becomes in those cases $D_0^{\epsilon}(\rho_0\|\gamma) := \min D_0(\rho'\|\gamma)$ such that $d(\rho_0,\rho') \le \epsilon$, where $d$ is the trace distance (this is called the smooth relative entropy). The interpretation is that the optimal worst-case work allowing for an error tolerance of $\epsilon = p(OUT)$ is $kT D_0^{\epsilon}(\rho_0\|\gamma)$, consistent with [3, 5, 6].


Appendix C: Cutting the work-tail, as well as the state-tail 

There can actually be (sets of) trajectories which are unlikely even if the initial state of the trajectory is likely, as 
the hopping probability may be low. For example if one lifts one level towards a very high value whilst thermalizing, 
there is one trajectory corresponding to staying in that level throughout, which would then be the one that gives 
the worst-case work. However if this is very unlikely one would wish to ignore such a trajectory when stating the 
worst-case work. In this section we show a way to do that, by not only cutting off a part of the initial state as 
previously, but also a part of the work distribution. This gives a different penalty term—lower in general—in the 
equality for the worst-case work. 

Proof overview —We shall again take the initial density matrix to have the form $\rho_0 = \sum_i p_i|i_0,\lambda_0\rangle\langle i_0,\lambda_0|$, not necessarily a thermal state. Then a sequence of Hamiltonian changes and thermalizations as described above is applied. This induces some work probability distribution and some worst-case work for the trajectories of interest.



Figure 5: Depiction of the trajectories of interest. We shall ignore trajectories that have undesirable, very unlikely work values (that are in the work-tail) and trajectories that start in very unlikely energy eigenstates (that start in the $\rho_0$-tail).


The argument is split in two. First, we define a set of trajectories of interest: Some trajectories are unlikely enough 
to be ignorable. We derive the worst-case work for that set. Next, we consider the probability that some trajectory 
is in that set. Combining these two parts gives our new equality for worst-case work. 

The set of trajectories of interest —We wish to ignore unlikely trajectories. We identify a set of trajectories of 
interest, defined as excluding trajectories of two types: 

1. $\rho_0$-tail trajectories: These are those which are called $T_{OUT}$ above, i.e. trajectories which start in $S_{OUT}$. We now call them $\rho_0$-tail trajectories, as using OUT risks generating confusion because of the second type of cut we shall make on the set of trajectories.

2. Work-tail trajectories: We also ignore trajectories associated with the worst work values, if those values are sufficiently improbable. This ignoring amounts to cutting off the worst-case tail of the work probability distribution. To simplify the proof, we define this tail in terms of the work probability distribution of the fictional thermal state $\tilde\gamma$. By “the work-tail,” we mean the set of trajectories associated with the following work values $w$: If the initial state is $\tilde\gamma$, there is an associated work probability distribution $\tilde p_{fwd}(w)$ for the given protocol, and an associated worst-case work $\tilde w^{\epsilon}$ that is guaranteed, up to probability $\epsilon$, not to be exceeded. The work-tail trajectories are by definition those with work cost $w > \tilde w^{\epsilon}$. Since the actual initial state $\rho_0$ may differ from $\tilde\gamma$, the probability that some trajectory begins in the work tail does not necessarily equal $\epsilon$.

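These sets are depicted in Fig. 5. We shall call the worst-case work in the set of interest $w^{\epsilon}_{IN}$.

The worst-case work in that set —We now derive the worst-case work in that set of trajectories, i.e. we maximise the work cost $w$ over that set of interest. We shall for the first part draw inspiration from an argument in m concerning scenarios where Crooks’ Theorem holds. Take the initial state of the forwards process to be $\rho_0 = \tilde\gamma$, and the initial state of the reverse process as $\gamma = e^{-\beta H(\lambda_f)}/Z_f$.

Maximize Crooks’ Theorem over the support of $\tilde p_{fwd}(w)$ [15]:

    $\max_w \frac{\tilde p_{fwd}(w)}{p_{rev}(-w)} = \max_w \frac{Z_f}{\tilde Z}\, e^{\beta w}.$

The RHS is monotonic in $w$, so maximizing the RHS over the support of $\tilde p_{fwd}(w)$ leads to the maximum $w$-value $\tilde w^0$. Taking the logarithm and recalling the $D_\infty$ definition yields

    $\beta\tilde w^0 = D_\infty(\tilde p_{fwd}(w)\|p_{rev}(-w)) - \log(Z_f/\tilde Z).$

Now, we cut off the work tail by defining a cut-off probability distribution $\tilde p^{\epsilon}_{fwd}(w) := 0$ if $w > \tilde w^{\epsilon}$, and $\tilde p_{fwd}(w)/(1-\epsilon)$ otherwise, wherein $\tilde w^{\epsilon}$ denotes the work guaranteed up to probability $\epsilon$ if $\tilde\gamma$ is the initial state. [Dividing by $(1-\epsilon)$ normalizes the distribution.] For work values outside the work tail, Crooks’ Theorem can be reformulated as

    $\frac{\tilde p^{\epsilon}_{fwd}(w)}{p_{rev}(-w)} = \frac{1}{1-\epsilon}\,\frac{Z_f}{\tilde Z}\, e^{\beta w}.$

Since the RHS is monotonic,

    $\max_w \frac{\tilde p^{\epsilon}_{fwd}(w)}{p_{rev}(-w)} = \frac{1}{1-\epsilon}\,\frac{Z_f}{\tilde Z}\, e^{\beta\tilde w^{\epsilon}},$

wherein the maximization is over the support of $\tilde p^{\epsilon}_{fwd}$. Taking the logarithm and rearranging yields

    $\beta\tilde w^{\epsilon} = D_\infty(\tilde p^{\epsilon}_{fwd}(w)\|p_{rev}(-w)) + \log(1-\epsilon) - \log(Z_f/\tilde Z).$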


The LHS is the worst-case work in the set of trajectories of interest. 

Probability that a trajectory is in the set of interest —The trajectories of interest are effectively the possible 
trajectories. To make precise what is meant by “effective,” we bound the probability of not being in that set. 

Consider a trajectory followed by a system initialized to $\rho_0$. The probability that the trajectory lies outside the set of interest is bounded by $p(\rho_0\text{-tail}) + p(\text{work-tail})$, as shown in Fig. 5. $p(\rho_0\text{-tail})$, defined via $\rho_0$ and the choice of effective support, is specified by input parameters. $p(\text{work-tail})$ denotes the probability that the trajectory is in the set associated with a worse work cost than $\tilde w^{\epsilon}$ (the work guaranteed up to probability $\epsilon$ not to be exceeded, if the initial state is $\tilde\gamma$). $p(\text{work-tail})$ does not necessarily equal $\epsilon$ for an arbitrary $\rho_0$. As $p(\text{work-tail})$ is not an input parameter, we wish to bound it with input parameters.

Let us drop the subscript “fwd” and refer simply to $p(w)$. The weight $p(w > x)$ in the actual work tail with $\rho_0$ cannot differ arbitrarily from the weight $\tilde p(w > x)$ in the work tail associated with $\tilde\gamma$:

    $|p(w > x) - \tilde p(w > x)| \le d(p(w), \tilde p(w)).$

This bound follows from the definition of the variation distance $d$, which equals the trace distance between diagonal states.^1

The variation distance $d$ is contractive under stochastic matrices, because the trace distance is contractive under completely positive trace-preserving (CPTP) maps. We note that the work distribution is the result of a stochastic matrix acting on the probability distribution over initial energy eigenstates. Let us now in this paragraph for convenience use Dirac notation for classical probability vectors, representing a probability distribution $p(w)$ as $\langle w|p\rangle$. The work distribution comes from the stochastic matrix $\sum_j |p_j\rangle\langle j|$ mapping a state $|\rho_0\rangle$ to a work distribution, wherein $j$ labels projectors onto $H(\lambda_0)$ eigenstates, $|p_j\rangle$ labels the work distribution when starting with an initial state $|j\rangle$ (i.e. $p_j(w) = \langle w|p_j\rangle$), and $|\rho_0\rangle = \sum_j q_j|j\rangle$. For example, if there are two possible eigenstates, we can write $|\rho_0\rangle = q_1|1\rangle + q_2|2\rangle = (q_1\ q_2)^T$, and the resulting work distribution $p(w) = (\langle w|p_1\rangle\langle 1| + \langle w|p_2\rangle\langle 2|)|\rho_0\rangle = q_1 p_1(w) + q_2 p_2(w)$.

Thus,

    $|p(w > x) - \tilde p(w > x)| \le d(p(w), \tilde p(w)) \le d(\rho_0, \tilde\gamma) \quad \forall x.$

For some $x = x'$, by definition, $\tilde p(w > x') = \tilde p(\text{work-tail}) = \epsilon$, and $p(\text{work-tail}) := p(w > x')$. Thus

    $p(\text{work-tail}) \le d(\rho_0, \tilde\gamma) + \epsilon.$
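The chain of inequalities can be checked on a toy example with two initial eigenstates and three work values. All numbers below (the conditional work distributions, $\rho_0$ and $\tilde\gamma$) are assumed purely for illustration; the code pushes both initial distributions through the same stochastic matrix and compares tail weights with the variation distances.

```python
def variation_distance(p, q):
    """Variation distance: half the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def push_forward(q, cond):
    """Work distribution sum_j q_j p_j(w) for conditional distributions cond[j]."""
    return [sum(q[j] * cond[j][k] for j in range(len(q))) for k in range(len(cond[0]))]

def tail_weight(p, work_values, x):
    """Weight of the work distribution strictly above x."""
    return sum(pk for wk, pk in zip(work_values, p) if wk > x)

work_values = [0.0, 0.5, 1.0]
cond = [[0.7, 0.2, 0.1],     # p_1(w): work distribution when starting in eigenstate 1
        [0.1, 0.3, 0.6]]     # p_2(w): work distribution when starting in eigenstate 2
rho0 = [0.9, 0.1]            # actual initial populations (q_1, q_2)
gamma_tilde = [0.6, 0.4]     # populations of the associated thermal state

p = push_forward(rho0, cond)
p_tilde = push_forward(gamma_tilde, cond)

d_work = variation_distance(p, p_tilde)
d_state = variation_distance(rho0, gamma_tilde)
tail_gaps = [abs(tail_weight(p, work_values, x) - tail_weight(p_tilde, work_values, x))
             for x in work_values]
print(tail_gaps, d_work, d_state)
```

In this example the largest tail gap saturates the first inequality, while the second inequality (contractivity under the stochastic matrix) is strict.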

Main result, also cutting work tail —We conclude that the worst-case work from the trajectories of interest, $w^{\epsilon}_{IN}$, respects

    $\beta w^{\epsilon}_{IN} = D_\infty(\tilde p^{\epsilon}_{fwd}(w)\|p_{rev}(-w)) + \log(1-\epsilon) - \log(Z_f/\tilde Z).$    (C1)

The probability that the trajectory is not in the set of interest is upper bounded by $p(\rho_0\text{-tail}) + p(\text{work-tail}) \le p(\rho_0\text{-tail}) + d(\rho_0, \tilde\gamma) + \epsilon$.


Appendix D: Continuous time versus discrete time 

We have mainly focused on the discrete-time protocol. Experimental realizations of thermodynamic protocols are
often described by a continuous master equation. Here, we show that the discrete protocol leads to a master equation
in the continuum model and vice versa. In this section we restrict ourselves to scenarios without energy coherences, 
i.e. the discrete-classical case. 


^1 See, e.g., Sec. 2 in http://people.csail.mit.edu/costis/6896spll/lec3s.pdf.

1. From discrete to continuous

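We consider a discrete sequence of times, $t_m = t_0 + m\,dt$ ($m = 0, 1, 2, \dots$), and the sequence $\lambda_m = \lambda(t_m)$ of values of the external parameter. As the waiting time decreases ($dt \to 0$), the transition probability $p(|i,\lambda(t),t\rangle \to |j,\lambda(t+dt),t+dt\rangle)$ due to thermalization should vanish for $i \neq j$. To first order, it behaves as

    $p(|i,\lambda(t),t\rangle \to |j,\lambda(t+dt),t+dt\rangle) \approx \delta_{ij} + \Gamma_{i\to j}(t)\,dt + O(dt^2).$    (D1)

The transition rate $\Gamma_{i\to j}(t)$ is a possibly complicated function of instantaneous energy levels $E(|i,\lambda(t),t\rangle)$. However, the transition rates inherit the condition

    $\frac{\Gamma_{i\to j}(t)}{\Gamma_{j\to i}(t)} = e^{-\beta[E(|j,\lambda(t),t\rangle) - E(|i,\lambda(t),t\rangle)]}$    (D2)

from detailed balance and the condition

    $\sum_j \Gamma_{i\to j}(t) = 0$    (D3)

from probability conservation. The occupation probability is

    $p(|j,\lambda(t+dt),t+dt\rangle) = \sum_i p(|i,\lambda(t),t\rangle)\, p(|i,\lambda(t),t\rangle \to |j,\lambda(t+dt),t+dt\rangle)$
    $\approx p(|j,\lambda(t),t\rangle) + \sum_i p(|i,\lambda(t),t\rangle)\,\Gamma_{i\to j}(t)\,dt - p(|j,\lambda(t),t\rangle)\sum_i \Gamma_{j\to i}(t)\,dt.$

If the occupation probability is a smooth function of time, the master equation

    $\frac{d}{dt}\, p(|j,\lambda(t),t\rangle) = \sum_i p(|i,\lambda(t),t\rangle)\,\Gamma_{i\to j}(t) - \sum_i p(|j,\lambda(t),t\rangle)\,\Gamma_{j\to i}(t)$    (D4)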

i i 

follows. The equivalence is further illustrated in Appendix]^ in the example of an electron box. 
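The passage from discrete steps to the master equation can be checked numerically. The following is a minimal Python sketch, not the authors' code. It assumes a fixed external parameter (time-independent rates) and a hypothetical rate choice, $\Gamma_{i\to j} \propto e^{-\beta(E_j - E_i)/2}$, which is only one of many forms compatible with (D2) and (D3). The names `rate_matrix` and `discrete_evolution` are illustrative.

```python
import numpy as np

def rate_matrix(energies, beta, gamma=1.0):
    """Hypothetical rates G[i, j] = Gamma_{i->j} obeying (D2) and (D3)."""
    e = np.asarray(energies, dtype=float)
    # Off-diagonal entries: the symmetric-exponent form satisfies detailed balance.
    G = gamma * np.exp(-0.5 * beta * (e[None, :] - e[:, None]))
    np.fill_diagonal(G, 0.0)
    # Diagonal entries enforce sum_j Gamma_{i->j} = 0 (probability conservation).
    np.fill_diagonal(G, -G.sum(axis=1))
    return G

def discrete_evolution(p0, G, total_time, n_steps):
    """Apply the first-order step matrix delta_ij + Gamma_{i->j} dt n_steps times."""
    dt = total_time / n_steps
    step = np.eye(len(p0)) + G * dt        # Eq. (D1) truncated at first order
    p = np.array(p0, dtype=float)
    for _ in range(n_steps):
        p = p @ step                        # p_j <- sum_i p_i * step[i, j]
    return p

G = rate_matrix([0.0, 1.0, 2.5], beta=1.0)
p_init = [1.0, 0.0, 0.0]
coarse = discrete_evolution(p_init, G, total_time=2.0, n_steps=50)
fine = discrete_evolution(p_init, G, total_time=2.0, n_steps=5000)
print("largest difference between coarse and fine steps:", np.max(np.abs(coarse - fine)))
```

Refining the time step changes the result only slightly, as expected if both discretizations approximate the same solution of (D4). For long times the distribution approaches the Gibbs state of the chosen energies.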


2. From continuous to discrete 

Going in the other direction, we now show explicitly how the discrete-time model can be derived from a physical master equation. Consider a two-level system that has a state $|0\rangle$, kept at zero energy, and a state $|1\rangle$ whose energy $\hbar\omega(t)$ changes. The Hamiltonian is $H(t) = \hbar\omega(t)|1\rangle\langle 1|$, and the system interacts with a temperature-$T$ heat bath. In [23], a master equation for the density matrix $\rho(t)$ was derived for such a system. In the present case, the master equation is

$$\dot\rho(t) = -i[H(t),\rho(t)] + \mathcal{L}(t)\rho(t), \qquad \text{(D5)}$$

$$\mathcal{L}(t)\rho = \Gamma\,d(\omega(t))\Big([n_{\rm th}(\omega(t)) + 1]\{[\sigma_-,\rho(t)\sigma_+] + \text{h.c.}\} + n_{\rm th}(\omega(t))\{[\sigma_+,\rho(t)\sigma_-] + \text{h.c.}\}\Big). \qquad \text{(D6)}$$

The heat bath's thermal photon number $n_{\rm th}(\omega) = (e^{\beta\hbar\omega} - 1)^{-1}$ depends on time because the upper level shifts. $d(\omega)$ is the dimensionless heat-bath density of states; $\Gamma$ denotes a rate assumed to be constant; $\sigma_- = |0\rangle\langle 1|$ denotes the usual lowering operator; and $\sigma_+ = \sigma_-^\dagger$. Equation (D5) has the form of the usual Lindblad master equation, but the Lindblad operator depends on time. The dependence arises only from the level spacing's time dependence. The Hamiltonian part contains the Lamb shift.

In the derivation of Eq. (D5) one assumes, as usual, weak coupling to the heat bath, the Markovian approximation, and the rotating-wave approximation. One also assumes that the adiabatic approximation holds, i.e. the system always remains in its time-local energy eigenstates when the interaction with the heat bath is ignored. This condition is always fulfilled under the assumption, made throughout this section, of vanishing energy coherences at all times.
Indeed, the part of (D5) pertaining to the diagonal elements of p(t) can be derived without the adiabatic assumption 

m- 

We now consider discrete times $t_n := n\Delta t$, $n = 0,\ldots,N$, with $\omega(t)$ constant during the time intervals $\Delta t$, $\omega_n := \omega(t_n)$. Restricting ourselves to changes of the Hamiltonian that only involve its spectrum, $H(t)$ and $\mathcal{L}(t)$ are constant during a given time interval.

Consider first the Hamiltonian changes. Heisenberg's equation of motion for the system-and-bath composite implies that $\dot\rho(t)$ has a finite jump when the Hamiltonian has a finite jump. Therefore, $\rho(t)$ is continuous when the Hamiltonian has a finite jump. Hence for finite Hamiltonian changes during a time $\delta t$, the system-and-bath composite's density matrix is unchanged in the limit as $\delta t \to 0$. Hence the system's reduced density matrix is unchanged during the instantaneous shift of energy levels.

As for the relaxation process, the initial thermal state is described in terms of occupation probabilities $p_n$ for the $n$-th level. The evolution during the relaxation process is given by $p(t) = e^{Tt}p(0)$, where $T$ is a matrix that connects the diagonal matrix elements of $\rho$ in the master equation (D5), $\dot\rho_{nn} = \sum_m T_{nm}\rho_{mm}$. The transition rates $T_{nm}$ inherit detailed balance from the rates appearing in the master equation, i.e. $T_{ij} = e^{-\beta(E_i - E_j)}T_{ji}$. Expanding $e^{Tt}$ into a power series, one realizes that for each power of $T$ detailed balance holds, i.e. $(T^k)_{ij} = e^{-\beta(E_i - E_j)}(T^k)_{ji}$ for all $k \in \mathbb{N}$, and therefore also for $e^{Tt}$. We thus have derived, from a physical model of a system that is coupled to a heat bath and whose energy levels are piecewise constant, the discrete-time model considered in the paper.
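The claim that detailed balance survives exponentiation can be illustrated with a short numerical check. This is a sketch under stated assumptions: the three energies and the symmetric-exponent rate form are hypothetical, `T[n, m]` is taken as the rate from level `m` to level `n` (matching $\dot\rho_{nn} = \sum_m T_{nm}\rho_{mm}$), and the matrix exponential is approximated by a truncated power series, which is adequate only for small matrices of moderate norm.

```python
import numpy as np

def generator(energies, beta, gamma=1.0):
    """Hypothetical generator with T[n, m] the rate m -> n and zero column sums."""
    e = np.asarray(energies, dtype=float)
    T = gamma * np.exp(-0.5 * beta * (e[:, None] - e[None, :]))
    np.fill_diagonal(T, 0.0)
    np.fill_diagonal(T, -T.sum(axis=0))
    return T

def expm_taylor(A, terms=60):
    """Truncated power series sum_k A^k / k! for the matrix exponential."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

def detailed_balance_defect(M, energies, beta):
    """Largest violation of M_ij = exp(-beta (E_i - E_j)) M_ji."""
    e = np.asarray(energies, dtype=float)
    boltzmann = np.exp(-beta * (e[:, None] - e[None, :]))
    return np.max(np.abs(M - boltzmann * M.T))

energies = [0.0, 1.0, 2.5]
T = generator(energies, beta=1.0)
U = expm_taylor(0.7 * T)                 # relaxation over a time 0.7
print(detailed_balance_defect(T, energies, 1.0))
print(detailed_balance_defect(T @ T @ T, energies, 1.0))
print(detailed_balance_defect(U, energies, 1.0))
```

All three defects vanish up to rounding error, in line with the power-series argument in the text.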


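To illustrate this, let us consider a two-level system. Expressing $\rho(t) = p_0(t)|0\rangle\langle 0| + [1 - p_0(t)]|1\rangle\langle 1|$, we obtain a differential equation for $p_0(t)$,

$$\dot p_0(t) + g(\omega(t))\,p_0(t) = \tfrac{1}{2}\,g(\omega(t)) + \Gamma d(\omega(t)), \qquad \text{(D7)}$$

wherein $g(\omega(t)) := 2\Gamma d(\omega(t))[2n_{\rm th}(\omega(t)) + 1]$. This equation has the general solution

$$p_0(t) = \Big[p_0(0) - \tfrac{1}{2} + \Gamma\int_0^t dt_1\,d(\omega(t_1))\,G(t_1)\Big]\Big/G(t) + \tfrac{1}{2}, \qquad \text{(D8)}$$

wherein $G(t) := \exp\{\int_0^t g(\omega(t_1))\,dt_1\}$. When $\omega$ is constant, as it is within each time interval, the integrals in Eq. (D8) can be calculated analytically:

$$p_0(t) = [p_0(0) - p_{0,\rm th}]\,e^{-gt} + p_{0,\rm th}, \qquad \text{(D9)}$$

wherein $p_{0,\rm th} := 1/(e^{-\beta\hbar\omega} + 1)$ denotes the ground state's thermal occupation. For large times, the memory of the initial state is lost, and the system relaxes towards thermal equilibrium. From Eq. (D9), we obtain the transition probabilities during relaxation over the time interval between $n\Delta t$ and $(n+1)\Delta t$:

$$\begin{aligned}
p(|0_n,\omega_n\rangle \to |0_{n+1},\omega_n\rangle) &= p_0(\Delta t)\big|_{p_0(0)=1}, \\
p(|0_n,\omega_n\rangle \to |1_{n+1},\omega_n\rangle) &= 1 - p(|0_n,\omega_n\rangle \to |0_{n+1},\omega_n\rangle), \\
p(|1_n,\omega_n\rangle \to |0_{n+1},\omega_n\rangle) &= p_0(\Delta t)\big|_{p_0(0)=0}, \\
p(|1_n,\omega_n\rangle \to |1_{n+1},\omega_n\rangle) &= 1 - p(|1_n,\omega_n\rangle \to |0_{n+1},\omega_n\rangle).
\end{aligned}$$

These transition probabilities obey detailed balance. As they remain unchanged by the inclusion of an instantaneous Hamiltonian change at the end of each time interval, we have $p(|i_n,\omega_n\rangle \to |j_{n+1},\omega_n\rangle) = p(|i_n,\omega_n\rangle \to |j_{n+1},\omega_{n+1}\rangle)$ for $i,j \in \{0,1\}$.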
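As a quick illustration, the transition probabilities can be evaluated from the constant-frequency relaxation law $p_0(t) = [p_0(0) - p_{0,\rm th}]e^{-gt} + p_{0,\rm th}$. The sketch below assumes $\hbar\omega > 0$ and a relaxation rate $g = 2\Gamma d\,(2n_{\rm th} + 1)$ with a constant density of states `dos`. The function names and parameter values are hypothetical.

```python
import math

def thermal_ground_population(beta, hbar_omega):
    """p_{0,th} = 1 / (exp(-beta * hbar_omega) + 1)."""
    return 1.0 / (1.0 + math.exp(-beta * hbar_omega))

def relaxation_rate(beta, hbar_omega, gamma, dos=1.0):
    """Assumed rate g = 2 * Gamma * d * (2 n_th + 1), valid for hbar_omega > 0."""
    n_th = 1.0 / math.expm1(beta * hbar_omega)
    return 2.0 * gamma * dos * (2.0 * n_th + 1.0)

def transition_probabilities(beta, hbar_omega, gamma, dt, dos=1.0):
    """Probabilities p[(i, j)] of going from level i to level j during one interval."""
    g = relaxation_rate(beta, hbar_omega, gamma, dos)
    p_th = thermal_ground_population(beta, hbar_omega)
    decay = math.exp(-g * dt)
    p00 = (1.0 - p_th) * decay + p_th    # relaxation law with p_0(0) = 1
    p10 = (0.0 - p_th) * decay + p_th    # relaxation law with p_0(0) = 0
    return {(0, 0): p00, (0, 1): 1.0 - p00, (1, 0): p10, (1, 1): 1.0 - p10}

probs = transition_probabilities(beta=2.0, hbar_omega=0.8, gamma=0.3, dt=0.5)
print(probs[(0, 1)] / probs[(1, 0)], math.exp(-2.0 * 0.8))
```

The ratio $p(0\to 1)/p(1\to 0)$ equals $e^{-\beta\hbar\omega}$ for any interval length, which is the detailed-balance property stated in the text.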


Appendix E: Application to solid-state system: Electron box 

To demonstrate the physical relevance of our results, we apply them to a realistic example, the so-called electron box. We first derive a time-local master equation for the level-occupation probabilities in Appendix E 1. As shown in Appendix D, it is equivalent to the discrete-time trajectory model discussed in the main text. Then the work distribution functions are analyzed numerically in Appendix E 2 and analytically in Appendix E 3. Finally, we provide an upper bound of the penalty term $D_\infty$, which reveals the direct physical relevance of our results.


1. Theoretical model and its justification 

We consider the type of system in [THUTH] . Following a semiclassical theory (known as “the orthodox theory”) 
such as in [25], we derive a master equation and illustrate the work fluctuations. While a more complete quantum 
description is possible (e.g., [53]), the semiclassical approach is useful for interpreting and identifying work and heat,
which are often ambiguous. 

The system (see figure) consists of a large metallic electrode R that serves as a charge reservoir, a small metallic island (or quantum dot) D, and a gate electrode. The island D is coupled only capacitively to the gate electrode but couples to the reservoir R capacitively and via tunnelling. The Hamiltonian has four parts: $H = H_R + H_D + H_C + H_T$. The first two terms,

$$H_R = \sum_k \epsilon_k c_k^\dagger c_k \quad\text{and}\quad H_D = \sum_q \varepsilon_q d_q^\dagger d_q, \qquad \text{(E1)}$$

describe the non-interacting parts of the electrode R and the island D. Here, $c_k^\dagger$ ($d_q^\dagger$) creates an electron with momentum $\hbar k$ ($\hbar q$) and energy $\epsilon_k$ ($\varepsilon_q$). The single-particle dispersions $\epsilon_k$ and $\varepsilon_q$ form continua of energy levels. $H_C$ signifies the Coulomb interaction among electrons confined in the island. Describing it within the capacitor model is sufficient:

$$H_C = \frac{Q_j^2}{2C_j} + \frac{Q_g^2}{2C_g}, \qquad \text{(E2)}$$


wherein $C_j$ and $C_g$ denote the junction and gate capacitances, and $Q_j$ and $Q_g$ are the equilibrium charges stored on them. One can find that

$$Q_j = C(V_g - Ne/C_g), \qquad \text{(E3a)}$$

$$Q_g = C(V_g + Ne/C_j), \qquad \text{(E3b)}$$

wherein $C := C_gC_j/(C_g + C_j)$ is the system's effective capacitance and $N$ is the number of excess electrons on the island D. $H_C$ can thus be rewritten as

$$H_C = E_C N^2 + \tfrac{1}{2}CV_g^2, \qquad \text{(E4)}$$

wherein $E_C := e^2/[2(C_g + C_j)]$ is the single-electron charging energy, one of the largest energy scales of the system. Finally, the tunnelling of electrons between R and D is described by

$$H_T = \sum_{kq} t_{kq}\,c_k^\dagger d_q + \text{h.c.}, \qquad \text{(E5)}$$

wherein $t_{kq}$ is the tunnelling amplitude. For common metals, which have wide conduction bands, $t_{kq} = t_d$ is independent of the momenta (or energy).

We are primarily interested in the macroscopic variable $N$ but not in the microscopic degrees of freedom $c_k$ and $d_q$, whose dynamics is typically much faster. One can thus integrate out $c_k$ and $d_q$ to get the effective Hamiltonian expressed only in terms of $N$. In the semiclassical approach, this can be achieved by considering the energy that an electron gains by tunnelling.

Suppose that an electron tunnels into the island D from the reservoir R. This will change the charge $Q_j \to Q_j - e$ and the excess number of electrons $N \to N + 1$. This new charge configuration, right after the tunnelling, is redistributed quickly to a new equilibrium configuration

$$Q_j' = C[V_g - (N+1)e/C_g], \qquad \text{(E6a)}$$

$$Q_g' = C[V_g + (N+1)e/C_j] \qquad \text{(E6b)}$$

by the gate voltage source. The voltage source moves the amount of charge

$$\Delta Q := Q_j' - (Q_j - e) = eC_g/(C_g + C_j) \qquad \text{(E7)}$$

through the transmission line from the junction interface to the gate capacitor by doing the amount

$$W = V_g\Delta Q = eV_gC_g/(C_g + C_j) \qquad \text{(E8)}$$

of work on the system. Therefore, the electron's overall energy gain $\Delta E$ is given by the work $W$ minus the change in the electrostatic energy:

$$\Delta E = E_C[2C_gV_g/e - (2N + 1)]. \qquad \text{(E9)}$$

As this energy gain comes from the transition $N \to N + 1$, the effective Hamiltonian for the macroscopic variable $N$ can be regarded as

$$H_{\rm eff} = E_C(N^2 - 2NN_g), \qquad \text{(E10)}$$

wherein $N_g := C_gV_g/e$. Recall that the second term comes from the work done on the system by the voltage source.
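The electrostatic bookkeeping above can be verified numerically. The sketch below is a hedged illustration in units where $e = 1$, with hypothetical capacitances and gate voltage. It computes the equilibrium charges, the charge moved by the voltage source, the work it does, and the resulting energy gain, and then compares the gain with the closed form $E_C[2C_gV_g/e - (2N+1)]$ and with the difference of the effective Hamiltonian. Function names are illustrative.

```python
def equilibrium_charges(N, Vg, Cg, Cj, e=1.0):
    """Equilibrium junction and gate charges for N excess electrons."""
    C = Cg * Cj / (Cg + Cj)
    return C * (Vg - N * e / Cg), C * (Vg + N * e / Cj)

def coulomb_energy(N, Vg, Cg, Cj, e=1.0):
    """Capacitor-model energy Q_j^2 / (2 C_j) + Q_g^2 / (2 C_g)."""
    Qj, Qg = equilibrium_charges(N, Vg, Cg, Cj, e)
    return Qj**2 / (2.0 * Cj) + Qg**2 / (2.0 * Cg)

def energy_gain(N, Vg, Cg, Cj, e=1.0):
    """Work done by the source minus the change in electrostatic energy for N -> N + 1."""
    Qj, _ = equilibrium_charges(N, Vg, Cg, Cj, e)
    Qj_new, _ = equilibrium_charges(N + 1, Vg, Cg, Cj, e)
    moved_charge = Qj_new - (Qj - e)      # charge moved by the voltage source
    work = Vg * moved_charge              # work done on the system
    return work - (coulomb_energy(N + 1, Vg, Cg, Cj, e) - coulomb_energy(N, Vg, Cg, Cj, e))

def h_eff(N, Vg, Cg, Cj, e=1.0):
    """Effective Hamiltonian E_C (N^2 - 2 N N_g)."""
    Ec = e**2 / (2.0 * (Cg + Cj))
    Ng = Cg * Vg / e
    return Ec * (N**2 - 2.0 * N * Ng)

Cg, Cj, Vg = 0.4, 1.1, 0.9                # hypothetical values, e = 1
for N in (-1, 0, 1, 2):
    print(N, energy_gain(N, Vg, Cg, Cj), h_eff(N, Vg, Cg, Cj) - h_eff(N + 1, Vg, Cg, Cj))
```

The two printed columns agree for every $N$, so the energy gain on tunnelling is exactly the decrease of $H_{\rm eff}$, which is why $H_{\rm eff}$ can serve as the Hamiltonian for $N$.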

The remaining effect of the microscopic degrees of freedom that have been removed from the macroscopic effective model is to make $N$ fluctuate randomly. As the transition $N \to N \pm 1$ is associated with tunnelling of an electron into/from the island, the transition rate can be obtained from Fermi's Golden Rule:

$$\Gamma(\Delta E) = \frac{2\pi|t_d|^2\rho_R\rho_D}{\hbar}\,\frac{\Delta E}{e^{\beta\Delta E} - 1}, \qquad \text{(E11)}$$

wherein $\rho_R$ and $\rho_D$ are the densities of states of R and D, respectively, and

$$\Delta E = H_{\rm eff}(N \pm 1) - H_{\rm eff}(N). \qquad \text{(E12)}$$

Finally, at sufficiently low temperatures ($\beta E_C \gg 1$), higher charging levels play no role, and considering the two lowest levels $N = 0$ and $N = 1$ is sufficient for $N_g \in [0,1]$ (see footnote). Together with Eqs. (E10) and (E11), this two-level approximation leads to the master equation


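$$\dot p_0 = -\Gamma_+ p_0 + \Gamma_- p_1, \qquad \text{(E13a)}$$

$$\dot p_1 = -\Gamma_- p_1 + \Gamma_+ p_0, \qquad \text{(E13b)}$$

wherein the transition rates are

$$\Gamma_\pm(t) := \Gamma(\pm\epsilon(t)) \quad\text{and}\quad \Gamma(\epsilon) = \frac{\Gamma_0}{\epsilon_c}\,\frac{\epsilon}{e^{\beta\epsilon} - 1}. \qquad \text{(E14)}$$

Here, $\epsilon_c$ is the bath's high-frequency cutoff (i.e., $\hbar/\epsilon_c$ is the correlation time), and $\Gamma_0$ is a constant that characterizes the strength of the coupling to the bath. $\Gamma_0/\epsilon_c$ is related to the material properties by $\Gamma_0/\epsilon_c = 2\pi|t_d|^2\rho_R\rho_D/\hbar$. Note that the transition rates satisfy the detailed-balance relation

$$\frac{\Gamma(+\epsilon)}{\Gamma(-\epsilon)} = e^{-\beta\epsilon}. \qquad \text{(E15)}$$

The time-local master equation (E13) is equivalent to the discrete-time trajectory model (see Appendix D). Therefore, the electron box is a realistic prototype system to which our results can apply.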
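A short numerical check of the detailed-balance relation may be helpful. The sketch assumes the rate form $\Gamma(\epsilon) = c\,\epsilon/(e^{\beta\epsilon} - 1)$ with a hypothetical constant prefactor $c$ standing in for $\Gamma_0/\epsilon_c$. This is one form consistent with the detailed-balance relation, and the function names are illustrative.

```python
import math

def tunnelling_rate(eps, beta, prefactor=1.0):
    """Assumed rate Gamma(eps) = prefactor * eps / (exp(beta * eps) - 1)."""
    if eps == 0.0:
        return prefactor / beta           # limit eps -> 0
    return prefactor * eps / math.expm1(beta * eps)

def total_rate_times_dt(eps, beta, dt, prefactor=1.0):
    """(Gamma_+ + Gamma_-) * dt, the total jump weight in one small time step."""
    return (tunnelling_rate(eps, beta, prefactor)
            + tunnelling_rate(-eps, beta, prefactor)) * dt

beta, eps = 1.5, 0.7
ratio = tunnelling_rate(eps, beta) / tunnelling_rate(-eps, beta)
print(ratio, math.exp(-beta * eps))
print(total_rate_times_dt(eps, beta, 0.01), 0.01 * eps / math.tanh(beta * eps / 2.0))
```

Under this assumption the ratio of upward to downward rates is $e^{-\beta\epsilon}$, and the sum of the two rates is proportional to $\epsilon\coth(\beta\epsilon/2)$.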


2. Monte Carlo simulation of the Electron Box 


We performed a Monte Carlo simulation of an erasure protocol in the electron-box set-up. Our simulation discretizes the protocol into time steps $\delta t$ that are small enough to justify the linear approximation that the population of level $i$ evolves from time $t$ to $t + \delta t$ according to $p_i(t + \delta t) = p_i(t) + \delta t\,\dot p_i(t)$. Using Eqs. (E13), we can write a stochastic matrix acting on the probabilities:

$$\begin{pmatrix} p_0(t + \delta t) \\ p_1(t + \delta t) \end{pmatrix} = \begin{pmatrix} 1 - \Gamma_+\delta t & \Gamma_-\delta t \\ \Gamma_+\delta t & 1 - \Gamma_-\delta t \end{pmatrix} \begin{pmatrix} p_0(t) \\ p_1(t) \end{pmatrix}. \qquad \text{(E16)}$$


For a two-level system which does not build up quantum coherences, a stochastic thermalizing matrix (which by its definition evolves all states towards the Gibbs state) has only one degree of freedom remaining once the Gibbs state has been chosen: the speed of the thermalization. This means that all models of two-level thermalizations for a given Gibbs state are equivalent. For our simulation we pick the conceptually straightforward partial swap, in which with some probability $P_{\rm sw}$ the current state of the system is exchanged with the Gibbs state, and otherwise it is unchanged: $M_{\rm swap} = (1 - P_{\rm sw})\mathbb{1} + P_{\rm sw}|{\rm Gibbs}\rangle\langle{\rm ones}|$, where $|{\rm ones}\rangle$ means the vector of 1's. For a Gibbs state associated with an energy level splitting $\epsilon$, we can write this explicitly as:

$$M_{\rm swap} = \begin{pmatrix} 1 - \dfrac{P_{\rm sw}\exp(-\beta\epsilon)}{1 + \exp(-\beta\epsilon)} & \dfrac{P_{\rm sw}}{1 + \exp(-\beta\epsilon)} \\[2ex] \dfrac{P_{\rm sw}\exp(-\beta\epsilon)}{1 + \exp(-\beta\epsilon)} & 1 - \dfrac{P_{\rm sw}}{1 + \exp(-\beta\epsilon)} \end{pmatrix}. \qquad \text{(E17)}$$

Equating Eq. (E16) with Eq. (E17), we can find the partial swap probability in terms of the physical parameters of the electron box:

$$P_{\rm sw}(t) = \frac{\Gamma_0\,\delta t}{\epsilon_c}\,\epsilon(t)\coth[\beta\epsilon(t)/2], \qquad \text{(E18)}$$

where we have written the swap probability $P_{\rm sw}(t)$ and the energy level splitting $\epsilon(t)$ as functions of time, to stress that this swap probability changes as the protocol evolves. Note that the probability changes only as a function of an external parameter, the splitting (as opposed to, e.g., the current state), and so Crooks' Theorem is still applicable to thermalizations of this type.

Footnote: The model is invariant under $N_g \to N_g + 1$, so it suffices to consider $N_g \in [0,1]$.

Figure 7: Szilard guaranteed work extraction vs. thermalisation rate. Work guaranteed to be extracted from a Szilard engine up to probability $\epsilon$. A Monte Carlo simulation was used to predict the work from the single-electron box. $w^\epsilon$ approaches $kT\ln 2$ as a function of the protocol's speed. For smaller $\epsilon$, $w^\epsilon$ approaches from below; and for higher, from above.

In our Monte Carlo simulation in Fig. 7, we randomly generate trajectories by picking a random initial microstate according to the initial state probability distribution. We then evolve the system by small steps, testing at each step whether a swap should occur (with probability $P_{\rm sw}$); if it does, we replace the state with a new microstate randomly chosen from the Gibbs state associated with the current Hamiltonian. By recording which microstate is occupied when the energy level is raised, we calculate the work cost associated with a particular trajectory. Repeated runs of the simulation allow us to build up a work distribution, to which the results in this paper can be applied.
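The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation used for the figure: the splitting is ramped linearly, the swap probability `p_swap` is held constant rather than computed from the electron-box rates, work is counted as the cost of raising the upper level while it is occupied, and the worst-case work is taken as the $(1-\epsilon)$ quantile of that cost. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate_work(eps_start, eps_end, n_steps, p_swap, beta, n_traj, seed=0):
    """Sample the work cost of n_traj trajectories under a linear ramp of the splitting."""
    rng = np.random.default_rng(seed)
    eps = np.linspace(eps_start, eps_end, n_steps + 1)
    # Initial microstates drawn from the Gibbs state of the initial splitting.
    p1 = 1.0 / (1.0 + np.exp(beta * eps[0]))
    state = (rng.random(n_traj) < p1).astype(int)
    work = np.zeros(n_traj)
    for k in range(n_steps):
        # Level shift: only trajectories occupying the upper level cost work.
        work += state * (eps[k + 1] - eps[k])
        # Partial swap: with probability p_swap, redraw from the current Gibbs state.
        p1 = 1.0 / (1.0 + np.exp(beta * eps[k + 1]))
        swap = rng.random(n_traj) < p_swap
        fresh = (rng.random(n_traj) < p1).astype(int)
        state = np.where(swap, fresh, state)
    return work

def worst_case_work(work, epsilon):
    """Work value exceeded by roughly a fraction epsilon of the sampled trajectories."""
    return np.quantile(work, 1.0 - epsilon)

work = simulate_work(eps_start=0.0, eps_end=5.0, n_steps=200,
                     p_swap=1.0, beta=1.0, n_traj=2000)
free_energy_change = -np.log((1.0 + np.exp(-5.0)) / 2.0)   # beta = 1
print(work.mean(), free_energy_change, worst_case_work(work, 0.05))
```

With full thermalization at every step (`p_swap=1.0`) and many small steps, the mean work comes out close to the free-energy difference, while lowering `p_swap` mimics a faster protocol and broadens the work distribution.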


3. Analytic expression for the work distribution 

The work distribution function for an electron box can also be obtained explicitly from the master equation.

The trajectory $\sigma(t) \in \{0,1\}$ of the system is piecewise constant, jumping discontinuously from one energy level to another at some random instants $t_j$ ($j = 1, 2, \ldots$). Therefore, it is specified uniquely by the initial condition $\sigma_0$, the number $J$ of jumps and the corresponding instants $t_j$ ($j = 1, 2, \ldots, J$). Then the probability distribution function for the trajectory is given by

$$P_J(t_1,\ldots,t_J;\sigma_0) = \prod_{j=1}^{J}\Gamma\big((-1)^{\sigma_0+j+1}\epsilon(t_j)\big)\exp[-S_J(t_1,\ldots,t_J;\sigma_0)], \qquad \text{(E19)}$$

where the effective action associated with a given trajectory has been defined by

$$S_J(t_1,\ldots,t_J;\sigma_0) = \sum_{j=1}^{J+1}\int_{t_{j-1}}^{t_j} ds\,\Gamma\big((-1)^{\sigma_0+j+1}\epsilon(s)\big) \qquad \text{(E20)}$$

and it is implied that $t_0 = 0$ and $t_{J+1} = \tau$. It is straightforward to check the normalization

$$P_0(\sigma_0) + \sum_{J=1}^{\infty}\prod_{j=1}^{J}\int_{t_{j-1}}^{\tau} dt_j\,P_J(t_1,\ldots,t_J;\sigma_0) = 1, \qquad \text{(E21)}$$

where again it is implied that $t_0 = 0$.

The work is only done while the system is in the state $\sigma = 1$, and hence the contribution to the work along the trajectory is given by

$$W_J(t_1,\ldots,t_J;\sigma_0) = \sum_{j=1}^{J}(-1)^{\sigma_0+j}\epsilon(t_j) + (\sigma_0 + J \bmod 2)\,\epsilon_f - \sigma_0\epsilon_0, \qquad \text{(E22)}$$

with $\epsilon_0 := \epsilon(0)$ and $\epsilon_f := \epsilon(\tau)$. The work distribution function along a trajectory with $J$ jumps reads as

$$P_J(W;\sigma_0) = \prod_{j=1}^{J}\int_{t_{j-1}}^{\tau} dt_j\,P_J(t_1,\ldots,t_J;\sigma_0)\,\delta\big(W - W_J(t_1,\ldots,t_J;\sigma_0)\big). \qquad \text{(E23)}$$

The total work distribution function can be written in a series

$$P(W) = p_0\,e^{-S_0(0)}\,\delta(W) + p_1\,e^{-S_0(1)}\,\delta(W - \epsilon_f + \epsilon_0) + \sum_{J=1}^{\infty}\sum_{\sigma_0} p_{\sigma_0}P_J(W;\sigma_0). \qquad \text{(E24)}$$


$P_J(W)$ has a factor of $\Gamma_0^J$, and at low temperatures, $P_J$ is rapidly suppressed as $J$ increases.

The expression (E24) for the work distribution is essentially a perturbative expansion in $\Gamma_0$ and converges very quickly for small $\Gamma_0$. For large $\Gamma_0$, however, it becomes impractical to use for actual calculations because of its slow convergence. Therefore, it is useful to devise a more general method, and we examine the characteristic function $Z(\xi) = \langle e^{\xi W}\rangle$ of the work distribution function $P(W)$. We first consider the characteristic function $Z_{\sigma_0}(\xi) = \langle e^{\xi W}\rangle_{\sigma_0}$, conditioned on all trajectories starting from a definite initial state $\sigma_0$. Regarded as a function of the operation time $\tau$, $Z_\sigma(\xi;\tau)$ satisfies the master equation

$$\partial_\tau Z_\sigma(\xi;\tau) = \sum_{\sigma'}\big[\Gamma_{\sigma\sigma'}(\tau) + \xi\,\partial_\tau\epsilon_\sigma(\tau)\,\delta_{\sigma\sigma'}\big]Z_{\sigma'}(\xi;\tau) \qquad \text{(E25)}$$


and the initial condition 




(E26) 


Compared with the original master equation (E13) for the level occupation probability, the new master equation (E25) for the characteristic function contains additional diagonal terms. The full characteristic function is then given by

$$Z(\xi) = \sum_{\sigma_0} p_{\sigma_0}Z_{\sigma_0}(\xi). \qquad \text{(E27)}$$

Recall that $Z(\xi)$ contains the same information as $P(W)$. Indeed, one can calculate $P(W)$ itself and, as shown in Section E 4 below, a bound for $D_\infty(P_{\rm fwd}(W)\|P_{\rm rev}(-W))$.

Let us now show that the work distribution in Eq. (E24) satisfies the Crooks fluctuation theorem: 


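$$\frac{P_{\rm fwd}(W)}{P_{\rm rev}(-W)} = e^{\beta W}\,\frac{Z_f}{Z_0}, \qquad \text{(E28)}$$

where $Z_0$ and $Z_f$ are the partition functions for the initial and final Hamiltonian in the forward protocol, respectively. Given a forward ramping $\epsilon(t)$, the reverse ramping is defined by

$$\epsilon^{\rm rev}(t) = \epsilon(\tau - t). \qquad \text{(E29)}$$

In the forward protocol, consider a trajectory $\sigma(t)$ characterized by the initial condition $\sigma_0$, the number $J$ of energy-level jumps and the jump instants $t_j$ ($j = 1, 2, \ldots, J$). One can find a unique trajectory $\sigma^{\rm rev}(t)$ in the reverse protocol, which is defined by the initial condition

$$\sigma_0^{\rm rev} = \sigma_0 + J \pmod 2 \qquad \text{(E30)}$$

and the flip instants

$$t_j^{\rm rev} = \tau - t_{J-j+1}. \qquad \text{(E31)}$$

Note that

(E32)

The effective action along the reverse trajectory is the same as that along the forward trajectory [cf. (E20)]:

$$S_J^{\rm rev}(t_1^{\rm rev},\ldots,t_J^{\rm rev};\sigma_0^{\rm rev}) = S_J(t_1,\ldots,t_J;\sigma_0). \qquad \text{(E33)}$$

Further, the work contribution along the reverse trajectory is just the negative of that along the forward trajectory [cf. (E22)]:

$$W_J^{\rm rev}(t_1^{\rm rev},\ldots,t_J^{\rm rev};\sigma_0^{\rm rev}) = -W_J(t_1,\ldots,t_J;\sigma_0). \qquad \text{(E34)}$$

These observations lead to

$$P_J^{\rm rev}(t_1^{\rm rev},\ldots,t_J^{\rm rev};\sigma_0^{\rm rev}) = P_J(t_1,\ldots,t_J;\sigma_0)\exp\big[-\beta W_J(t_1,\ldots,t_J;\sigma_0) + (\sigma_0 + J \bmod 2)\beta\epsilon_f - \sigma_0\beta\epsilon_0\big] \qquad \text{(E35)}$$

and

$$P_J^{\rm rev}(-W;\sigma_0^{\rm rev}) = P_J(W;\sigma_0)\exp\big[-\beta W + (\sigma_0 + J \bmod 2)\beta\epsilon_f - \sigma_0\beta\epsilon_0\big]. \qquad \text{(E36)}$$

It is then straightforward to prove Crooks' Theorem:

$$\begin{aligned}
P_{\rm rev}(-W) &= \frac{1}{1 + e^{-\beta\epsilon_f}}\sum_{J=0}^{\infty}\sum_{\sigma_0^{\rm rev}} e^{-\sigma_0^{\rm rev}\beta\epsilon_f}\,P_J^{\rm rev}(-W;\sigma_0^{\rm rev}) \\
&= \frac{1}{1 + e^{-\beta\epsilon_f}}\sum_{J=0}^{\infty}\sum_{\sigma_0} e^{-\beta(\sigma_0 + J \bmod 2)\epsilon_f}\,P_J^{\rm rev}(-W;\sigma_0^{\rm rev}) \\
&= \frac{e^{-\beta W}}{1 + e^{-\beta\epsilon_f}}\sum_{J=0}^{\infty}\sum_{\sigma_0} e^{-\sigma_0\beta\epsilon_0}\,P_J(W;\sigma_0) \\
&= e^{-\beta W}\,\frac{1 + e^{-\beta\epsilon_0}}{1 + e^{-\beta\epsilon_f}}\,P_{\rm fwd}(W).
\end{aligned}$$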


4. Upper bound of the $D_\infty$ term

Recall the Markov inequality for a non-negative random variable $X$:

$$p(X \geq a) \leq \langle X\rangle/a.$$

This is derived by noting that there cannot be too much probability of having a value much greater than the average, or else the average would have to be greater. In our case it reads

$$p(w > w^\epsilon) := \epsilon \leq \langle w\rangle/w^\epsilon.$$

Figure 8: $w^\epsilon$ and its upper bounds $\langle w\rangle/\epsilon$ (which in turn bound $D_\infty$) for different values of $\epsilon$. (a) Individual plots of $w^\epsilon$ and $\langle w\rangle/\epsilon$. (b) The relative tightness.

Thus

$$w^\epsilon \leq \langle w\rangle/\epsilon.$$
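The bound can be illustrated on sampled data. The sketch below is only an illustration: it draws hypothetical non-negative work values from an exponential distribution (an arbitrary choice, not the electron-box work distribution), estimates $w^\epsilon$ as the empirical $(1-\epsilon)$ quantile, and compares it with $\langle w\rangle/\epsilon$.

```python
import numpy as np

def markov_comparison(samples, epsilon):
    """Return the empirical (1 - epsilon) quantile and the Markov bound <w> / epsilon."""
    w = np.asarray(samples, dtype=float)
    if np.any(w < 0):
        raise ValueError("the Markov inequality requires non-negative values")
    return np.quantile(w, 1.0 - epsilon), w.mean() / epsilon

rng = np.random.default_rng(1)
samples = rng.exponential(scale=1.0, size=20000)   # hypothetical work values
for epsilon in (0.01, 0.05, 0.2):
    w_eps, bound = markov_comparison(samples, epsilon)
    print(epsilon, w_eps, bound)
```

For this choice of distribution the bound holds with a wide margin and tightens as $\epsilon$ grows.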


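Recalling the main result and rearranging it,

$$D_\infty\big(P^{\epsilon}_{\rm fwd}(w)\,\|\,P_{\rm rev}(-w)\big) = \beta w^\epsilon - \log(1-\epsilon) + \log(\tilde{Z}/Z), \qquad \text{(E37)}$$

we now have

$$D_\infty\big(P^{\epsilon}_{\rm fwd}(w)\,\|\,P_{\rm rev}(-w)\big) \leq \beta\langle w\rangle/\epsilon - \log(1-\epsilon) + \log(\tilde{Z}/Z) \qquad \text{(E38)}$$

(here $\log(\tilde{Z}/Z) = \log 2$). One has only to find the upper bound of $\langle w\rangle$. One can do this most easily by means of the characteristic function $\langle e^{\xi w}\rangle$, which bounds $\langle w\rangle$ due to convexity. This has been illustrated in Fig. 8.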














